1 Methods

1.1 DNA extraction and library preparation

DNA was extracted using the Qiagen MagAttract PowerSoil DNA KF kit (Formerly MOBio PowerSoil DNA Kit) using a KingFisher robot. DNA quality was evaluated visually via gel electrophoresis and quantified using a Qubit 3.0 fluorometer (Thermo-Fischer, Waltham, MA, USA). Libraries were prepared using an Illumina Nextera library preparation kit with an in-house protocol (Illumina, San Diego, CA, USA).

1.2 Sequencing, data curation, and sequence processing

Paired-end sequencing (150 bp x 2) was done in a NovaSeq 6000 instrument. Shotgun metagenomic sequence reads were processed with the Sunbeam pipeline. Initial quality evaluation was done using FastQC v0.11.5 1. Processing took part in four steps: adapter removal, read trimming, low-complexity-reads removal, and host-sequence removals. Adapter removal was done using cutadapt v2.6 2. Trimming was done with Trimmomatic v0.36 3 using custom parameters (LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). Low-complexity sequences were detected with Komplexity v0.3.6 4.

High-quality reads were mapped to the human genome (Telomere-to-Telomere assembly, T2T-CHM13v2.0, GCF_009914755.1) and those that mapped to it were removed from the analysis.

The remaining reads were taxonomically classified using Kraken2 with the PlusPF database version 2022-09-265. For functional profiling, high-quality (filtered) reads were aligned against the SEED database via translated homology search and annotated to Subsystems, or functional levels, 1-3 using Super-Focus 6.

2 Quality Control Results

Sequence quality was good and host contamination was generally low. Reads removed because they matched one of the pre-specified host/contaminant genomes are grouped into the ‘Host/Contaminant’ category. Reads removed due to low complexity or quality are grouped into the ‘Low Quality’ category.

Samples from the Kangerlussuaq location had fewer reads than the other two sites, which suggests major differences in the DNA quality among the sites. For that readson, we excluded that site from the statistical analysis (PERMANOVA, differential abundance, and differences in alpha diversity).

2.1 Read quality

2.2 Read decontamination

2.3 Reads per sample

3 Taxonomy Results

Bacteria dominated the communities followed by Eukaryotes, Archaea, and Viruses accounting for 97.9%, 1.271%, 0.617%, and 0.21% respectively. Bacteria were dominated by Proteobacteria, Actinobacteria, and Bacteroidetes.

3.1 Composition at kingdom level

3.2 Bacterial composition at phylum level

3.3 Bacterial phyla #1

3.4 Bacterial phyla #2

3.5 Bacterial phyla #3

3.6 Ten most abundant bacterial species

3.7 Ten most abundant bacterial genera

3.8 Ten most abundant eukaryota species (Note: eukaryotal databases are sparse and identification can be problematic)

3.9 Ten most abundant archaeal species

3.10 Ten most abundant viral species

4 Beta diversity analysis

Taxonomic profiles (species-level) were compared using Bray-Curtis dissimilarities and samples were visualized using non-metric multidimensional scaling (NMDS).

4.1 Ordination of samples based on location

4.2 Ordination of samples based on location (Kangerlussuaq site excluded)

5 Functional profiles (Seed subsystems)

Functional genes were organized into the the SEED functional hierarchy. At broad levels (levels 1 and 2), functional profiles were very similar.

5.1 subsystem - Level 1

5.2 subsystem - Level 2

6 Analysis of variation

We use a Permutational multivariate analysis variation to estimate the effects of experimental factors on both taxonomic and functional profiles. We found significant differences due to location which explained around 84.5% of the taxonomic profiles variation and 73.8% of the functional profiles variation.

6.1 PERMANOVA taxonomic profiles

Df SumOfSqs R2 F Pr..F.
Group 1 0.815 0.845 43.492 0.014
Residual 8 0.150 0.155 NA NA
Total 9 0.966 1.000 NA NA

6.2 PERMANOVA functional profiles

Df SumOfSqs R2 F Pr..F.
Group 1 0.035 0.738 22.586 0.014
Residual 8 0.013 0.262 NA NA
Total 9 0.048 1.000 NA NA

7 Differential abundance analysis

We used Negative binomial models (DESEq2 R package) for differential abundance testing of taxonomic and subsystem level 3 features. We looked for differences between groups and found 3372 significant species and 620 differently abundant functions (adjusted p-values < 0.01).

7.1 Differentially abundant taxa (twenty most significant)

7.2 Differentially abundant functions (twenty most significant)

8 Alpha diversity

Alpha diversity was calculated from taxonomic profiles (Species level) using Shannon’s diversity index. We observed significant differences between locations (excluding the outlier, Kangerlussuaq location, Kruskal-Wallis chi-squared = 6.8182, df = 1, p-value = 0.009023).

8.1 Shannons’ diversity across groups

9 References


  1. Bioinformatics Group at the Babraham Institute. Software available at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/↩︎

  2. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2015;17:1–3.↩︎

  3. Bolger AM., Lohse M. Trimmomatic: a flexible trimmer for Illumina sequence data. Usadel B. 2014.Bioinformatics.Aug 1;30(15):2114-20. doi:10.1093/bioinformatics/btu170.↩︎

  4. Clarke, E.L., Taylor, L.J., Zhao, C. et al. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome 7, 46 (2019) doi:10.1186/s40168-019-0658-x. https://github.com/eclarke/komplexity.↩︎

  5. Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0↩︎

  6. Silva, G. G. Z., Green K., B. E. Dutilh, and R. A. Edwards: SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data. Bioinformatics. 2015 Oct 9. pii: btv584. Website: https://edwards.sdsu.edu/SUPERFOCUS↩︎